Abstract

In this paper, we consider stochastic optimal control of Markov Jump Linear Systems with state 
feedback but without observation of the jumping parameter. The proposed control law is assumed to 
be linear with constant gains that can be obtained from the necessary optimality conditions using an 
iterative algorithm. The proposed approach is demonstrated in a numerical example. 


1. Introduction 

Since their introduction by Krasovskii and Lidskii in 1969 [1, 2, 3], Markov Jump Linear Systems 
(MJLS) have received a considerable amount of interest. This is due to their ability to capture 
systems whose dynamics are subject to abrupt changes that are not independently distributed. The MJLS
modeling approach is used, e.g., in networked control [4, 5], economics [6, 7], and control of systems
with component failures [8].

Most works that consider control of MJLS assume availability of the jumping parameter, or mode,
that models the abrupt model switching. This assumption makes it possible to derive optimal control laws in
continuous [9] and discrete time [10, 11] for systems with state feedback. In the measurement-feedback
case, mode availability guarantees that the separation between control and estimation holds. Thus,
the optimal control law consists of an optimal linear regulator and an optimal Kalman filter [12].

However, if the mode is not available, the control law becomes nonlinear because of the dual 
effect [13, 14]. In this case, the optimal solution is computationally intractable due to the curse of 
dimensionality. Thus, research concentrates on approximate control laws. We distinguish between 
two classes of approaches: (i) approaches based on assumed separation and (ii) approaches based on 
structural assumptions. Approaches that belong to the first class approximate the involved conditional 
densities. By doing so, it is possible to establish separation. Then, the optimal control law consists of 
an estimator and a regulator whose gains are linear. The estimator is either based on an Interacting 
Multiple Model (IMM) algorithm [15] or on a Viterbi-like algorithm [16]. Approaches that belong 
to the second class make an assumption concerning the structure of the control law, usually that the control law is
linear, such as in [6, 17, 8, 18].

Between full mode observation and no mode observation lies clustered mode observation. The
term clustered can refer to (1) temporal interchange between full mode observation and no mode
observation, and (2) observation of subsets of modes, i.e., observation of whether one of the modes in a
subset is active or not. We will not review this field in our paper. We refer the interested reader to,
e.g., [6, 19] and the references therein.

In this paper, we take approach (ii) and assume the controller to be linear with a
constant regulator gain. Our approach differs from the works [6, 17, 8, 18] in the following way:
[6, 17, 8] assume time-variant controller gains, and [18] considers finite-horizon control with constant
gains. Moreover, the approaches from [6, 17, 8, 18] have to be implemented in a receding-horizon framework to be
applicable for long operation times. The approach presented in [18] can be used to compute constant
gains. However, in this case, the optimization horizon becomes a parameter that must be chosen
sufficiently large in order to obtain an infinite-horizon control law. To obtain the controller gain for
the approach presented in this paper, we minimize an infinite-horizon cost function. By doing so,
there is no need to choose an optimization horizon or to implement the control law in
a receding-horizon framework. However, the latter can be done in order to, e.g., adapt the control law
to changes in the system dynamics (both continuous- and discrete-valued), if desired. As we will see
in the numerical example, the performance of the proposed controller, although it is time-invariant,
can be almost identical to the performance of the receding-horizon time-variant controller from [8].

The dynamics of the discrete-time MJLS considered in this paper are given by

$$x_{k+1} = A_{\theta_k} x_k + B_{\theta_k} u_k + H_{\theta_k} w_k , \tag{1}$$

where $x_k \in \mathbb{R}^{n}$ denotes the system state, $u_k \in \mathbb{R}^{m}$ the control input, and $w_k$ the independent and
identically distributed (i.i.d.) zero-mean second-order noise of appropriate dimension with covariance $W = \mathrm{E}\{w_k w_k^\top\}$. Here,
$\mathrm{E}\{\cdot\}$ is the expectation operator and $A^\top$ denotes the transpose of $A$. The matrices $A_{\theta_k}$, $B_{\theta_k}$, and $H_{\theta_k}$
are selected from time-invariant sets of matrices $\{A_1, \dots, A_M\}$, $M \in \mathbb{N}$, etc. according to the jumping
parameter $\theta_k \in \{1, 2, \dots, M\}$, which is the state of a regular homogeneous Markov chain. We will refer
to $\theta_k$ as the mode. The regularity assumption guarantees that the limit distribution $\theta_\infty$ of $\theta_k$ exists [20].
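The limit distribution of a regular chain can be computed numerically from its transition matrix. The following is a minimal sketch, assuming NumPy and a row-stochastic transition matrix; the function name `limit_distribution` and the example matrix are illustrative and not part of the paper.

```python
import numpy as np

def limit_distribution(T):
    """Return the limit distribution pi of a regular Markov chain.

    T[i, j] is the probability of jumping from mode i to mode j
    (each row sums to one). pi solves pi @ T = pi with sum(pi) = 1.
    """
    T = np.asarray(T, dtype=float)
    M = T.shape[0]
    # Stack the stationarity equations (T^T - I) pi = 0 with the
    # normalization 1^T pi = 1 and solve in the least-squares sense.
    lhs = np.vstack([T.T - np.eye(M), np.ones((1, M))])
    rhs = np.concatenate([np.zeros(M), [1.0]])
    pi, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return pi

# Illustrative two-mode chain (an assumption for this example).
T_example = np.array([[0.7, 0.3],
                      [0.6, 0.4]])
print(limit_distribution(T_example))
```

For a regular chain the stacked system has a unique solution, so the least-squares solve returns it exactly up to rounding.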


The performance of the controlled system is measured by an infinite-horizon cost function

$$J = \lim_{K \to \infty} \frac{1}{K} \mathrm{E}\left\{ \sum_{k=0}^{K} x_k^\top Q_{\theta_k} x_k + u_k^\top R_{\theta_k} u_k \right\} , \tag{2}$$

where for $i \in \{1, \dots, M\}$ the mode-dependent cost matrices $Q_i$ are positive semidefinite and $R_i$ are
positive definite.


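The task is to find a control law that minimizes (2). As mentioned above, the optimal nonlinear
control law that solves this task is computationally intractable. Thus, we make a structural assumption
and choose the control law to be linear, mode-independent, and constant, i.e.,

$$u_k = L x_k .$$

With this control law assumption, the considered problem can be formulated as

$$\begin{aligned}
\min_{L} \quad & \lim_{K \to \infty} \frac{1}{K} \mathrm{E}\left\{ \sum_{k=0}^{K} x_k^\top Q_{\theta_k} x_k + u_k^\top R_{\theta_k} u_k \right\} \\
\text{subject to} \quad & x_{k+1} = A_{\theta_k} x_k + B_{\theta_k} u_k + H_{\theta_k} w_k , \quad u_k = L x_k , \\
& x_0 , \ \theta_0 , \ T .
\end{aligned} \tag{3}$$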

Outline. The remainder of the paper is organized as follows. Before we present the main result in 
Sec. 3, we introduce necessary definitions in the next section. A numerical example is given in Sec. 4 
and Sec. 5 concludes the paper. 
2. Prerequisites


Consider the MJLS

$$x_{k+1} = A_{\theta_k} x_k \tag{4}$$

with $x_k \in \mathbb{R}^{n}$ being the system state, $\theta_k \in \{1, \dots, M\}$ being the state of a regular, homogeneous
Markov chain with transition matrix $T$, and $A_{\theta_k} \in \{A_1, \dots, A_M\}$.


Definition 1 (Mean Square Stability)

System (4) is mean square (MS) stable if, for any initial $x_0$ and $\theta_0$, it holds that

$$\lim_{k \to \infty} \mathrm{E}\{ x_k x_k^\top \} = 0 .$$


Remark 1 If system (4) is affected by zero-mean second-order noise $w_k$ such that

$$x_{k+1} = A_{\theta_k} x_k + H_{\theta_k} w_k ,$$

then the second moment $\mathrm{E}\{x_k x_k^\top\}$ converges to a fixed point that is not 0, i.e.,

$$\lim_{k \to \infty} \mathrm{E}\{ x_k x_k^\top \} = \{Q_1, \dots, Q_M\} ,$$

where $Q_i$ are positive semidefinite. This claim can be shown using Banach's fixed point theorem
(see Appendix B).

The following theorem provides necessary and sufficient conditions for MS stability. 

Theorem 1 For system (4), the following two conditions for mean square stability hold.

a) System (4) is MS stable if for any positive definite matrices $Q_1, \dots, Q_M$ there exist positive definite
matrices $P_1, \dots, P_M$ such that

M 

i=l 

where $p_{ij}$ denotes the transition probability from mode $i$ to mode $j$.

b) System (4) is MS stable if the spectral radius $\rho(\cdot)$ of the matrix

$$\mathcal{M} = (T^\top \otimes I)\, \mathrm{diag}\left[ (A_1 \otimes A_1) \ \dots \ (A_M \otimes A_M) \right] , \tag{5}$$

where $\otimes$ is the Kronecker product and diag the block diagonalization operator, satisfies

$$\rho(\mathcal{M}) < 1 .$$


Proof 1 The proofs are given in [21, 22, 23] for systems with real-valued state and in [24] for systems 
with complex-valued state. 
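Condition b) of Theorem 1 is straightforward to evaluate numerically. The sketch below assumes NumPy, a row-stochastic transition matrix `T` with `T[i, j]` the probability of jumping from mode i to mode j, and square mode matrices of equal size; the function names `ms_matrix` and `is_ms_stable` are hypothetical helpers, not taken from the paper.

```python
import numpy as np

def ms_matrix(A_list, T):
    """Build (T^T kron I) @ blockdiag(A_1 kron A_1, ..., A_M kron A_M)."""
    T = np.asarray(T, dtype=float)
    M = len(A_list)
    n2 = A_list[0].shape[0] ** 2
    D = np.zeros((M * n2, M * n2))
    for i, A in enumerate(A_list):
        # i-th diagonal block holds the Kronecker square of A_i.
        D[i * n2:(i + 1) * n2, i * n2:(i + 1) * n2] = np.kron(A, A)
    return np.kron(T.T, np.eye(n2)) @ D

def is_ms_stable(A_list, T):
    """Return (stable, rho), where rho is the spectral radius of the matrix."""
    rho = max(abs(np.linalg.eigvals(ms_matrix(A_list, T))))
    return rho < 1.0, rho

# Illustrative scalar example: modes 1.2 and 0.5 with i.i.d. switching.
A_list = [np.array([[1.2]]), np.array([[0.5]])]
T = np.array([[0.5, 0.5], [0.5, 0.5]])
print(is_ms_stable(A_list, T))
```

In this scalar example the spectral radius equals the expected squared gain, 0.5 · 1.44 + 0.5 · 0.25 = 0.845, so the system is MS stable even though one mode is unstable on its own.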

Next, we define MS stabilizability. 


Definition 2 (Mean Square Stabilizability)

System

$$x_{k+1} = A_{\theta_k} x_k + B_{\theta_k} u_k ,$$

with $x_k \in \mathbb{R}^{n}$ being the system state, $\theta_k \in \{1, \dots, M\}$ being the state of a regular, homogeneous Markov
chain, and $A_{\theta_k} \in \{A_1, \dots, A_M\}$, $B_{\theta_k} \in \{B_1, \dots, B_M\}$, is linearly mean square (MS) stabilizable
without mode observation if there exists a matrix $L$ such that

$$x_{k+1} = (A_{\theta_k} + B_{\theta_k} L)\, x_k$$

is mean square stable.
3. Main Result


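Before we present the necessary optimality conditions for (3), we define the second-moment system
state

$$X_k^{(i)} = \mathrm{E}\left\{ x_k x_k^\top \mathbf{1}_{\theta_k = i} \right\} ,$$

where $\mathbf{1}_{\theta_k = i} = 1$ if $\theta_k = i$ and 0 otherwise. The dynamics of the second-moment system state are

$$X_{k+1}^{(j)} = \sum_{i=1}^{M} p_{ij} \left[ (A_i + B_i L) X_k^{(i)} (A_i + B_i L)^\top + \theta_k^{(i)} H_i W H_i^\top \right] , \tag{6}$$

where $\theta_k^{(i)}$ is the probability of being in mode $i$ at time step $k$.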
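One step of the second-moment recursion can be sketched as follows. This is an illustrative NumPy implementation under the assumption that modes are indexed from zero and that `T[i, j]` is the probability of jumping from mode i to mode j; the function name `second_moment_step` is hypothetical.

```python
import numpy as np

def second_moment_step(X, theta, A, B, H, L, W, T):
    """Propagate the mode-wise second moments by one time step.

    X[i]     : E{x_k x_k^T 1(theta_k = i)}
    theta[i] : probability of being in mode i at time k
    Returns the lists for time k + 1.
    """
    M = len(A)
    n = A[0].shape[0]
    X_next = [np.zeros((n, n)) for _ in range(M)]
    for i in range(M):
        A_cl = A[i] + B[i] @ L  # closed-loop matrix of mode i
        contrib = A_cl @ X[i] @ A_cl.T + theta[i] * (H[i] @ W @ H[i].T)
        for j in range(M):
            # Mass leaving mode i is split according to row i of T.
            X_next[j] += T[i, j] * contrib
    theta_next = T.T @ theta  # mode probabilities follow the chain
    return X_next, theta_next
```

Summing the returned matrices over all modes gives the overall second moment of the state at the next time step, which is a convenient sanity check.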

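Theorem 2 (Necessary Optimality Conditions)

The necessary optimality conditions for the optimization problem (3) are given by

$$(A_i + B_i L)^\top P_\infty^{(i)} (A_i + B_i L) + (Q_i + L^\top R_i L) - \Lambda^{(i)} = 0 , \tag{7}$$

$$\sum_{i=1}^{M} p_{ij} \left[ (A_i + B_i L) X_\infty^{(i)} (A_i + B_i L)^\top + \theta_\infty^{(i)} H_i W H_i^\top \right] - X_\infty^{(j)} = 0 , \tag{8}$$

$$\sum_{i=1}^{M} \left[ (R_i + B_i^\top P_\infty^{(i)} B_i) L X_\infty^{(i)} + B_i^\top P_\infty^{(i)} A_i X_\infty^{(i)} \right] = 0 , \tag{9}$$

where $X_\infty^{(i)} = \lim_{k \to \infty} X_k^{(i)}$, $\Lambda^{(i)} \in \{\Lambda^{(1)}, \dots, \Lambda^{(M)}\}$ are positive definite, and $P_\infty^{(i)} = \sum_{j=1}^{M} p_{ij} \Lambda^{(j)}$.

Proof 2 The proof is given in Appendix A.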

Please observe that equations (7) constitute a set of coupled Riccati-like equations that reduce to 
the uncoupled Riccati equations if system (1) has only one mode. 

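Finding a solution of (7)-(9) is not trivial. We propose to use a scheme similar to that presented
in [25] or [26]. To this end, we first rewrite (9) using the vectorization operator as

$$\left( \sum_{i=1}^{M} \left[ X_\infty^{(i)} \otimes (R_i + B_i^\top P_\infty^{(i)} B_i) \right] \right) \mathrm{vec}(L) + \mathrm{vec}\left( \sum_{i=1}^{M} B_i^\top P_\infty^{(i)} A_i X_\infty^{(i)} \right) = 0 .$$

Solving for $\mathrm{vec}(L)$ yields

$$\mathrm{vec}(L) = - \left( \sum_{i=1}^{M} \left[ X_\infty^{(i)} \otimes (R_i + B_i^\top P_\infty^{(i)} B_i) \right] \right)^{\dagger} \mathrm{vec}\left( \sum_{i=1}^{M} B_i^\top P_\infty^{(i)} A_i X_\infty^{(i)} \right) ,$$

where $A^{\dagger}$ denotes the Moore-Penrose pseudoinverse of $A$.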
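The vectorized gain equation translates almost literally into NumPy. The sketch below is an illustration under stated assumptions: `vec` stacks columns (hence `order="F"`), the second-moment matrices are symmetric, and the function name `gain_from_moments` is hypothetical.

```python
import numpy as np

def gain_from_moments(X, P, A, B, R):
    """Solve the vectorized stationarity condition for the gain L.

    X, P, A, B, R are lists with one matrix per mode.
    Uses vec(G L X) = (X^T kron G) vec(L) with column-stacking vec
    and symmetric X, so that X^T can be replaced by X.
    """
    M = len(A)
    n = A[0].shape[0]
    m = B[0].shape[1]
    lhs = sum(np.kron(X[i], R[i] + B[i].T @ P[i] @ B[i]) for i in range(M))
    rhs = sum(B[i].T @ P[i] @ A[i] @ X[i] for i in range(M))
    vec_L = -np.linalg.pinv(lhs) @ rhs.flatten(order="F")
    # Undo the vectorization (column-major) to recover the m x n gain.
    return vec_L.reshape((m, n), order="F")
```

With a single mode and an invertible second moment, the result coincides with the familiar expression $-(R + B^\top P B)^{-1} B^\top P A$, which makes a convenient check of the vectorization order.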
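The numerical algorithm is the following.

Step 1: Initialize $\{X_{[0]}^{(1)}, \dots, X_{[0]}^{(M)}\}$ and $\{\Lambda_{[0]}^{(1)}, \dots, \Lambda_{[0]}^{(M)}\}$ with random values and compute
$P_{[0]}^{(1)}, \dots, P_{[0]}^{(M)}$.

Step 2: Compute

$$\mathrm{vec}(L_{[k+1]}) = - \left( \sum_{i=1}^{M} \left[ X_{[k]}^{(i)} \otimes (R_i + B_i^\top P_{[k]}^{(i)} B_i) \right] \right)^{\dagger} \mathrm{vec}\left( \sum_{i=1}^{M} B_i^\top P_{[k]}^{(i)} A_i X_{[k]}^{(i)} \right)$$

and reverse the vectorization operator in order to obtain $L_{[k+1]}$, i.e.,

$$L_{[k+1]} = \mathrm{devec}\left( \mathrm{vec}(L_{[k+1]}) \right)$$

with $\mathrm{devec}(\mathrm{vec}(A)) = A$.

Step 3: Compute

$$X_{[k+1]}^{(j)} = \sum_{i=1}^{M} p_{ij} \left[ (A_i + B_i L_{[k+1]}) X_{[k]}^{(i)} (A_i + B_i L_{[k+1]})^\top + \theta_\infty^{(i)} H_i W H_i^\top \right] ,$$

$$\Lambda_{[k+1]}^{(i)} = (A_i + B_i L_{[k+1]})^\top P_{[k]}^{(i)} (A_i + B_i L_{[k+1]}) + (Q_i + L_{[k+1]}^\top R_i L_{[k+1]}) ,$$

$$P_{[k+1]}^{(i)} = \sum_{j=1}^{M} p_{ij} \Lambda_{[k+1]}^{(j)} .$$

Stop if $X_{[k]}^{(i)}$ and $\Lambda_{[k]}^{(i)}$, $i = 1, \dots, M$, have converged. Otherwise, return to Step 2.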
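The iteration above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the initial guesses are random positive definite matrices, the limit distribution is obtained from a least-squares solve, convergence is declared when the largest entry-wise change falls below `tol`, and all names (`solve_constant_gain`, `iters`, `tol`, `seed`) are hypothetical.

```python
import numpy as np

def solve_constant_gain(A, B, H, Q, R, W, T, iters=500, tol=1e-10, seed=0):
    """Fixed-point iteration for the constant, mode-independent gain L."""
    rng = np.random.default_rng(seed)
    M, n, m = len(A), A[0].shape[0], B[0].shape[1]
    # Limit distribution of the chain: pi T = pi, sum(pi) = 1.
    stat = np.vstack([T.T - np.eye(M), np.ones((1, M))])
    theta = np.linalg.lstsq(stat, np.r_[np.zeros(M), 1.0], rcond=None)[0]

    def random_pd():
        G = rng.standard_normal((n, n))
        return G @ G.T + np.eye(n)

    # Initialization: random second moments and multipliers.
    X = [random_pd() for _ in range(M)]
    Lam = [random_pd() for _ in range(M)]
    P = [sum(T[i, j] * Lam[j] for j in range(M)) for i in range(M)]
    for _ in range(iters):
        # Gain update from the vectorized stationarity condition.
        lhs = sum(np.kron(X[i], R[i] + B[i].T @ P[i] @ B[i]) for i in range(M))
        rhs = sum(B[i].T @ P[i] @ A[i] @ X[i] for i in range(M))
        L = (-np.linalg.pinv(lhs) @ rhs.flatten(order="F")).reshape((m, n), order="F")
        # Update of second moments, multipliers, and coupled sums.
        Acl = [A[i] + B[i] @ L for i in range(M)]
        X_new = [sum(T[i, j] * (Acl[i] @ X[i] @ Acl[i].T
                                + theta[i] * (H[i] @ W @ H[i].T))
                     for i in range(M)) for j in range(M)]
        Lam_new = [Acl[i].T @ P[i] @ Acl[i] + Q[i] + L.T @ R[i] @ L
                   for i in range(M)]
        P = [sum(T[i, j] * Lam_new[j] for j in range(M)) for i in range(M)]
        change = max(np.abs(new - old).max()
                     for new, old in zip(X_new + Lam_new, X + Lam))
        X, Lam = X_new, Lam_new
        if change < tol:
            break
    return L, X, Lam
```

For a single mode the iteration reduces to the standard Riccati value iteration, which gives a simple way to validate the code before applying it to a multi-mode system. The returned gain should still be checked for mean square stability of the closed loop.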

Remark 2 As in the case with i.i.d. system parameters considered in [25], convergence of the given
algorithm does not always guarantee stability of the MJLS. Thus, it is always necessary to check whether the
computed control law stabilizes (1) using Theorem 1 with

$$\tilde{A}_{\theta_k} = A_{\theta_k} + B_{\theta_k} L .$$

Remark 3 In order to check whether an MJLS is MS-stabilizable, it is possible to use the procedure
described in Appendix C.

4. Numerical Example 

In order to demonstrate the performance of the proposed control approach, we performed Monte 
Carlo simulation runs with 100 time steps each for different system and noise parameters. For each 
run, we computed the control law using different random initial guesses. The evolution of the mode 
and the noise were also randomly generated for each run. For comparison, we used the optimal 
controller published by Chizeck et al. [10] that needs a mode feedback, and the finite-horizon controller 
without mode availability presented by Vargas et al. [8]. 
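A single simulation run of the kind described above can be sketched as follows. This is an illustrative NumPy sketch, not the code used for the reported results: modes are indexed from zero, the noise is assumed Gaussian with a covariance that is either positive definite or zero, the cost is the time average of the stage costs, and the function name `simulate_cost` is hypothetical.

```python
import numpy as np

def simulate_cost(A, B, H, Q, R, W, T, L, x0, theta0, steps=100, seed=0):
    """Simulate one run of the MJLS under u_k = L x_k and return the
    time-averaged quadratic cost."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    mode = theta0
    # Cholesky factor for sampling; a zero covariance means no noise.
    chol = np.linalg.cholesky(W) if np.any(W) else np.zeros_like(W)
    cost = 0.0
    for _ in range(steps):
        u = L @ x  # mode-independent constant-gain control law
        cost += x @ Q[mode] @ x + u @ R[mode] @ u
        w = chol @ rng.standard_normal(W.shape[0])
        x = A[mode] @ x + B[mode] @ u + H[mode] @ w
        # Draw the next mode from row `mode` of the transition matrix.
        mode = rng.choice(len(A), p=T[mode])
    return cost / steps
```

Averaging the returned value over many seeds gives a Monte Carlo estimate of the mean cost for a given gain, transition matrix, and noise covariance.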

The constant parameters of the simulated MJLS were chosen as

$$A_1 = \begin{bmatrix} 1.2 & 1.2 \\ 0 & 1 \end{bmatrix} ,$$


Ao = 


R = M R = h 
’ ^ 1 ’ ^ 0.2 


$$H_1 = I , \quad H_2 = I , \quad Q_1 = I , \quad Q_2 = I , \quad R_1 = 1 , \quad R_2 = 1 .$$

We considered two different noise scenarios with

$$W_1 = 0.01^2 I \quad \text{and} \quad W_2 = 0.5^2 I ,$$
three different Markov chains with

$$T_1 = \begin{bmatrix} 0.9 & 0.1 \\ 0.1 & 0.9 \end{bmatrix} , \quad T_2 = \begin{bmatrix} 0.7 & 0.3 \\ 0.6 & 0.4 \end{bmatrix} , \quad T_3 = \begin{bmatrix} 0.1 & 0.9 \\ 0.3 & 0.7 \end{bmatrix} ,$$

and two different initial states

$$x_{0,1} = [0 \ 0]^\top \quad \text{and} \quad x_{0,2} = [3 \ 2]^\top .$$

The spectral radii of the corresponding matrices constructed according to (5) are

$$\rho(\mathcal{M}_1) = 1.3295 , \quad \rho(\mathcal{M}_2) = 1.2970 , \quad \text{and} \quad \rho(\mathcal{M}_3) = 1.1047 ,$$

which shows that the MJLS is unstable for each of the transition matrices $T_1$, $T_2$, and $T_3$.

Fig. 1 depicts the state trajectory, the applied control inputs, and the modes of the MJLS of an
example run with $T_1$, $W_1$, and $x_0 = [3 \ 2]^\top$. Although the controller from [8] has time-variant
gains while the proposed controller is time-invariant, the trajectories of both controllers are very similar.

The results of the Monte Carlo simulation with $x_0 = [0 \ 0]^\top$ are depicted in Fig. 2 and the
corresponding mean values of the costs are given in Table 1. In this scenario, the performance of
the proposed controller and the controller from [8] is only slightly worse than the performance of the
optimal controller with mode observation. Moreover, the performance of the proposed controller and the
controller from [8] is almost equal. For $x_0 = [3 \ 2]^\top$, the simulation results are depicted in Fig. 3 and
the mean costs are given in Table 2. In this second scenario, the proposed control law performs well
compared to the two other controllers if the noise covariance is large. However, the performance is
worse if the noise covariance is low and the transition matrix is either $T_1$ or $T_3$. It is important to
note that in contrast to the controller from [8], the proposed controller is precomputed offline and
does not depend on the initial state $x_0$ and the initial mode $\theta_0$. Thus, the computational footprint
during operation is low.




      | W_1                                         | W_2
      | Chizeck et al. | proposed  | Vargas et al.  | Chizeck et al. | proposed | Vargas et al.
T_1   | 6.1005e-3      | 6.7175e-3 | 6.6756e-3      | 15.2508        | 16.7925  | 16.6886
T_2   | 5.2853e-3      | 5.3018e-3 | 5.2953e-3      | 13.2122        | 13.2535  | 13.2373
T_3   | 7.5863e-3      | 8.1038e-3 | 8.0839e-3      | 18.9620        | 20.2582  | 20.2065

Table 1: Mean costs of the three compared controllers for $x_0 = [0 \ 0]^\top$.




      | W_1                                         | W_2
      | Chizeck et al. | proposed  | Vargas et al.  | Chizeck et al. | proposed | Vargas et al.
T_1   | 67.5311        | 77.9765   | 72.6012        | 82.7204        | 94.7306  | 89.4974
T_2   | 65.9251        | 66.8358   | 67.5779        | 79.1335        | 80.0799  | 80.8079
T_3   | 62.6784        | 70.3547   | 64.7367        | 81.6350        | 90.6844  | 84.9624

Table 2: Mean costs of the three compared controllers for $x_0 = [3 \ 2]^\top$.


An implementation of the presented control law is available at the CloudRunner homepage [27]. 
Figure 1: Example run of the three compared controllers with $T_1$, $W_1$, and $x_0 = [3 \ 2]^\top$ (horizontal axis: simulation time).


5. Conclusion 

In this paper, we presented a method to compute a constant linear policy for infinite-horizon
optimal control of stochastic MJLS with state feedback but without mode observation. To this end,
we rewrote the MJLS dynamics in terms of the second moment, constructed the Hamiltonian,
and proposed an iterative algorithm that minimizes the cost function.

In the provided numerical example, the proposed control law has only slightly worse performance 
than the control laws from [8] and [10] although it is mode-independent, time-invariant, and can be 
precomputed offline. 

Future work will be concerned with the derivation of convergence guarantees for the iterative algorithm
and an extension of the proposed approach to measurement-feedback control. Furthermore,
assuming more complicated policy structures, such as polynomials, constitutes another possible
research direction.
Figure 2: Results of the Monte Carlo simulation. Depicted are the costs of the three compared controllers for different
transition matrices and noise covariances, and $x_0 = [0 \ 0]^\top$.


Appendix A. Proof of Theorem 2 

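If the dynamics (1) is mean square stabilizable, then the second-moment state converges to a fixed
point $X_\infty^{(j)} = \lim_{k \to \infty} X_k^{(j)}$ that is the unique solution of

$$X_\infty^{(j)} = \sum_{i=1}^{M} p_{ij} \left[ (A_i + B_i L) X_\infty^{(i)} (A_i + B_i L)^\top + \theta_\infty^{(i)} H_i W H_i^\top \right] .$$

This claim is proven in Appendix B.

Thus, the costs (2) are finite and can be rewritten as

$$J = \sum_{i=1}^{M} \mathrm{trace}\left[ (Q_i + L^\top R_i L) X_\infty^{(i)} \right]$$

and the optimization problem (3) becomes

$$\begin{aligned}
\min_{L} \quad & \sum_{i=1}^{M} \mathrm{trace}\left[ (Q_i + L^\top R_i L) X_\infty^{(i)} \right] \\
\text{subject to} \quad & X_\infty^{(j)} = \sum_{i=1}^{M} p_{ij} \left[ (A_i + B_i L) X_\infty^{(i)} (A_i + B_i L)^\top + \theta_\infty^{(i)} H_i W H_i^\top \right] .
\end{aligned} \tag{A.1}$$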
Figure 3: Results of the Monte Carlo simulation. Depicted are the costs of the three compared controllers for different
transition matrices and noise covariances, and $x_0 = [3 \ 2]^\top$.


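Defining the positive semidefinite Lagrange multiplier $\Lambda^{(j)}$, we obtain the Hamiltonian $\mathcal{H}$ of (A.1)
with

$$\mathcal{H} = \mathrm{trace}\left\{ \sum_{i=1}^{M} (Q_i + L^\top R_i L) X_\infty^{(i)} + \sum_{j=1}^{M} \Lambda^{(j)} \left[ \sum_{i=1}^{M} p_{ij} \left( (A_i + B_i L) X_\infty^{(i)} (A_i + B_i L)^\top + \theta_\infty^{(i)} H_i W H_i^\top \right) - X_\infty^{(j)} \right] \right\} .$$

Differentiation with respect to $X_\infty^{(i)}$, $\Lambda^{(j)}$, and $L$ yields (7)-(9).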


Appendix B. Proof of Second Moment Convergence 


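According to Banach's fixed point theorem, $\{X_k^{(1)}, \dots, X_k^{(M)}\}$ converges to the unique solution
$\{X_\infty^{(1)}, \dots, X_\infty^{(M)}\}$ for $k \to \infty$ if (6) is a contraction mapping. To show that (6) is indeed a contraction
mapping if (1) is MS-stabilizable, we define the vectorized second moment state vector

$$t_k = \left[ \mathrm{vec}(X_k^{(1)})^\top \ \dots \ \mathrm{vec}(X_k^{(M)})^\top \right]^\top , \tag{B.1}$$

where $\mathrm{vec}(\cdot)$ denotes the vectorization operator. The dynamics of $t_k$ can be written as

$$t_{k+1} = \mathcal{M} t_k + p_k$$

with $\mathcal{M}$ as in Theorem 1 and

$$p_k = (T^\top \otimes I)\, \mathrm{diag}\left[ \theta_k^{(1)} (H_1 \otimes H_1) \ \dots \ \theta_k^{(M)} (H_M \otimes H_M) \right] \left[ \mathrm{vec}(W)^\top \ \dots \ \mathrm{vec}(W)^\top \right]^\top .$$

We need to show that

$$\| t_{k+1} - \tilde{t}_{k+1} \| \le \beta \, \| t_k - \tilde{t}_k \| ,$$

where $\beta \in (0, 1)$ and

$$\tilde{t}_k = \left[ \mathrm{vec}(Y^{(1)})^\top \ \dots \ \mathrm{vec}(Y^{(M)})^\top \right]^\top$$

for any positive semidefinite $\{Y^{(1)}, \dots, Y^{(M)}\}$. Using Lemma 5 from [28], it holds

$$\| t_{k+1} - \tilde{t}_{k+1} \| = \| \mathcal{M} t_k + p_k - \mathcal{M} \tilde{t}_k - p_k \| = \| \mathcal{M} t_k - \mathcal{M} \tilde{t}_k \| = \| \mathcal{M} (t_k - \tilde{t}_k) \| \le \lambda_{\max} \| t_k - \tilde{t}_k \| ,$$

where $\lambda_{\max}$ is the largest eigenvalue of $\mathcal{M}^\top \mathcal{M}$. Because for $\lambda_{\max} < 1$ the vectorized dynamics of $t_k$ defined in (B.1) are a contraction mapping,
they have a unique fixed point.

Please note that the obtained result corresponds to the stability condition in Theorem 1.b because
$\lambda_{\max} = \rho(\mathcal{M})^2$ holds.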

Appendix C. Stabilizability Test 

In order to determine whether the MJLS (1) is MS-stabilizable, we can solve the following
optimization problem

$$\min_{L} \ \rho(\mathcal{M}) , \tag{C.1}$$

where

$$\mathcal{M} = (T^\top \otimes I)\, \mathrm{diag}\left[ \tilde{A}_1 \otimes \tilde{A}_1 \quad \tilde{A}_2 \otimes \tilde{A}_2 \quad \dots \quad \tilde{A}_M \otimes \tilde{A}_M \right]$$

with

$$\tilde{A}_i = A_i + B_i L .$$

If the minimum obtained from (C.1) satisfies $\rho(\mathcal{M}) < 1$, then system (1) is MS-stabilizable and we can
compute the optimal linear control law according to the numerical algorithm provided in Sec. 3.
Fig. C.4 illustrates the spectral radii for the system from Sec. 4. It can be seen that the value function
in (C.1) is convex in this scenario.
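For low-dimensional gains, the spectral radius in (C.1) can be explored with a crude grid search. The sketch below is only an illustration of the stabilizability test under stated assumptions: it uses exhaustive search over a user-supplied grid rather than a smooth approximation or a gradient-based solver, and the function names `closed_loop_radius` and `grid_search_gain` are hypothetical.

```python
import itertools
import numpy as np

def closed_loop_radius(L, A, B, T):
    """Spectral radius of (T^T kron I) blockdiag((A_i + B_i L) kron (A_i + B_i L))."""
    M = len(A)
    n2 = A[0].shape[0] ** 2
    D = np.zeros((M * n2, M * n2))
    for i in range(M):
        A_cl = A[i] + B[i] @ L
        D[i * n2:(i + 1) * n2, i * n2:(i + 1) * n2] = np.kron(A_cl, A_cl)
    return max(abs(np.linalg.eigvals(np.kron(T.T, np.eye(n2)) @ D)))

def grid_search_gain(A, B, T, grid):
    """Try every gain whose entries lie on `grid`; return the best one found."""
    n = A[0].shape[0]
    m = B[0].shape[1]
    best_L, best_rho = None, np.inf
    for entries in itertools.product(grid, repeat=m * n):
        L = np.array(entries, dtype=float).reshape(m, n)
        rho = closed_loop_radius(L, A, B, T)
        if rho < best_rho:
            best_L, best_rho = L, rho
    return best_L, best_rho
```

A returned radius below one certifies MS-stabilizability by the gain found; a value above one on a coarse grid is inconclusive, since a stabilizing gain may lie between grid points or outside the searched range. The cost of the search grows exponentially with the number of gain entries, so it is only practical for very small systems.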

However, the value function $\rho(\mathcal{M})$ in (C.1) is non-smooth. Thus, we propose to use the smooth
convex approximation presented in [29]. The approximation replaces the spectral radius operator
$\rho(A)$ by

^ /A\ 

I^e(A) = e^exp , (C.2) 
Figure C.4: Spectral radii for the MJLS from Sec. 4.


where $\varepsilon > 0$ and $\lambda_i$ are the $N$ eigenvalues of $A$.

Using this approximation, we let $\varepsilon$ go from 1 to 0 and solve a sequence of optimization problems

$$\min_{L} \ \mu_\varepsilon(\mathcal{M}) .$$

Because for the approximation (C.2) it holds

$$\lim_{\varepsilon \to 0} \mu_\varepsilon(A) = \rho(A) ,$$

we recover the initial optimization problem (C.1) as $\varepsilon$ goes to zero. Additionally, we can use the
gradient and the Hessian given in [29].